Weighted logistic regression for large-scale imbalanced and rare events data

Authors

  • Maher Maalouf
  • Mohammad Siddiqi

Abstract

Recent developments in computing and technology, along with the availability of large amounts of raw data, have led to the development of many computational techniques and algorithms. In binary classification in particular, analyzing data that contain rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. Logistic Regression (LR) is a powerful classifier, and combining LR with the truncated-regularized iteratively re-weighted least squares (TR-IRLS) algorithm has provided a powerful classification method for large data sets. This study examines imbalanced data with binary response variables containing many more non-events (zeros) than events (ones). It has been established in the literature that such variables are difficult to predict and explain. This research combines rare-event corrections to LR with truncated Newton methods. The proposed method, Rare Event Weighted Logistic Regression (RE-WLR), processes large imbalanced data sets at roughly the same speed as TR-IRLS but with higher accuracy. © 2014 Elsevier B.V. All rights reserved.
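
The abstract does not give RE-WLR's exact update rules, so the following is only a minimal sketch of the general idea under stated assumptions: a weighted, L2-regularized logistic regression fitted by a truncated Newton method, where each Newton (IRLS) system is solved approximately with a capped number of linear conjugate-gradient iterations, in the spirit of TR-IRLS. The per-observation weights are assumed to follow King and Zeng's rare-events weighting, w1 = τ/ȳ for events and w0 = (1 - τ)/(1 - ȳ) for non-events, where τ is an assumed population event rate and ȳ is the sample event rate. The helper names (`rare_event_weights`, `cg_solve`, `fit_weighted_lr`) and the parameters `lam`, `newton_iter`, and `cg_iter` are hypothetical, and the sketch omits any bias correction the paper may apply.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def rare_event_weights(y, tau):
    """Hypothetical helper: King-Zeng style weights for rare-events data.

    tau is the assumed population fraction of events (ones);
    ybar is the observed sample fraction of events.
    """
    ybar = y.mean()
    w1 = tau / ybar                   # weight for events (y = 1)
    w0 = (1.0 - tau) / (1.0 - ybar)   # weight for non-events (y = 0)
    return np.where(y == 1, w1, w0)


def cg_solve(matvec, b, max_iter=30, tol=1e-6):
    """Truncated linear conjugate gradient for A x = b, A symmetric positive definite.

    Stops after max_iter steps or once ||r|| <= tol * ||b||.
    """
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0 initially
    p = r.copy()          # current search direction
    rs = r @ r
    tol_abs = tol * np.sqrt(rs)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol_abs:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def fit_weighted_lr(X, y, w, lam=1.0, newton_iter=20, cg_iter=30, tol=1e-8):
    """Weighted L2-regularized LR via truncated Newton (IRLS steps solved by CG).

    Minimizes  -sum_i w_i [y_i log p_i + (1 - y_i) log(1 - p_i)] + (lam / 2) ||beta||^2,
    with p_i = sigmoid(x_i . beta). An intercept column, if present, is penalized too.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(newton_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (p - y)) + lam * beta
        if np.linalg.norm(grad) <= tol:
            break
        d = w * p * (1.0 - p)  # diagonal of the weighted IRLS matrix

        def hess_vec(v):
            # Hessian-vector product X^T diag(d) X v + lam v, without forming the Hessian
            return X.T @ (d * (X @ v)) + lam * v

        # Approximate Newton step: at most cg_iter CG iterations on H s = -grad
        beta = beta + cg_solve(hess_vec, -grad, max_iter=cg_iter)
    return beta
```

Capping `cg_iter` is what makes each Newton step "truncated": for a wide, sparse `X`, the inner solve stops well before an exact solution, trading a slightly less accurate step for much cheaper iterations. For simplicity, the sketch also penalizes the intercept column.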

Similar sources

Robust weighted kernel logistic regression in imbalanced and rare events data

Recent developments in computing and technology, along with the availability of large amounts of raw data, have contributed to the creation of many effective techniques and algorithms in the fields of pattern recognition and machine learning. The main objectives for developing these algorithms include identifying patterns within the available data or making predictions, or both. Great success h...

Mine Classification with Imbalanced Data

In binary classification problems it is common for the two classes to be imbalanced: one case is very rare compared to the other. Traditional classification approaches usually ignore this class imbalance, and their performance suffers accordingly. In contrast, the infinitely imbalanced logistic regression (IILR) algorithm explicitly addresses class imbalance in its formulation. This p...

Infinitely Imbalanced Logistic Regression

In binary classification problems it is common for the two classes to be imbalanced: one case is very rare compared to the other. In this paper we consider the infinitely imbalanced case where one class has a finite sample size and the other class’s sample size grows without bound. For logistic regression, the infinitely imbalanced case often has a useful solution. Under mild conditions, the in...

Logistic Regression for Extremely Rare Events

Objectives: The quantitative analysis of extremely rare events and the factors influencing these events poses some difficulties. The objective of my paper is to evaluate logistic regression for events millions of times rarer than non-events. Methods: Based on former theoretical and experimental results, a simulation study is conducted. Specialized software is developed and supplied with this paper. ...

Hybrid Method of Logistic Regression and Data Envelopment Analysis for Event Prediction: A Case Study (Stroke Disease)

Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. Many mathematical models have been developed and used for prediction, and in some cases they have been found to be very strong and reliable. This paper studies different mathematical and statistical approaches for event prediction. The ...

Journal:
  • Knowl.-Based Syst.

Volume 59

Published: 2014